Multiword Expression Filtering for Building Knowledge Maps

Author

  • Shailaja Venkatsubramanyan
Abstract

This paper describes an algorithm that can be used to improve the quality of multiword expressions extracted from documents. We measure multiword expression quality by the "usefulness" of a multiword expression in helping ontologists build knowledge maps that allow users to search a large document corpus. Our stopword-based algorithm takes n-grams extracted from documents and cleans them up to make them more suitable for building knowledge maps. Running our algorithm on large corpora of documents has shown that it helps to increase the percentage of useful terms from 40% to 70%, with an eight-fold improvement observed in some cases.
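
The abstract does not spell out the cleanup rules, so the following is only a minimal Python sketch of one plausible stopword-based n-gram filter. It assumes three rules: trim stopwords from the start and end of each n-gram, discard candidates that shrink below two words, and merge case-insensitive duplicates. The stopword list and the names `STOPWORDS`, `clean_ngram`, `filter_ngrams`, and `min_tokens` are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List, Optional

# Hypothetical stopword list for illustration only; the paper does not
# publish its list, and a real deployment would use a larger, tuned one.
STOPWORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}
)


def clean_ngram(ngram: str, stopwords: FrozenSet[str] = STOPWORDS) -> Optional[str]:
    """Lowercase an n-gram and trim stopwords from both ends.

    Returns None when only stopwords remain. Stopwords inside the
    expression (e.g. "map of the world") are kept.
    """
    tokens = ngram.lower().split()
    start, end = 0, len(tokens)
    while start < end and tokens[start] in stopwords:
        start += 1
    while end > start and tokens[end - 1] in stopwords:
        end -= 1
    # An empty join ("") means nothing but stopwords was left.
    return " ".join(tokens[start:end]) or None


def filter_ngrams(
    ngrams: Iterable[str],
    stopwords: FrozenSet[str] = STOPWORDS,
    min_tokens: int = 2,  # hypothetical: keep multiword expressions only
) -> List[str]:
    """Clean each candidate, drop ones shorter than min_tokens words,
    and remove duplicates while preserving first-seen order."""
    seen = set()
    kept: List[str] = []
    for ngram in ngrams:
        cleaned = clean_ngram(ngram, stopwords)
        if cleaned is None or len(cleaned.split()) < min_tokens:
            continue
        if cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    return kept


if __name__ == "__main__":
    candidates = [
        "the knowledge map",
        "knowledge maps of",
        "of the",
        "search engine",
        "a large document corpus",
        "The knowledge map",
    ]
    print(filter_ngrams(candidates))
    # prints ['knowledge map', 'knowledge maps', 'search engine', 'large document corpus']
```

Trimming only at the edges keeps expressions such as "map of the world" intact, while turning fragments such as "knowledge maps of" into a term an ontologist could place on a knowledge map.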

Similar resources

Automatic Discovery And Aggregation Of Compound Names For The Use In Knowledge Representations

Abstract: Automatic acquisition of information structures like Topic Maps or semantic networks from large document collections is an important issue in knowledge management. An inherent problem with automatic approaches is the treatment of multiword terms as single semantic entities. Taking company names as an example, we present a method for learning multiword terms from large text corpora expl...

Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies

In this paper, we present an algorithm for extracting translations of any given multiword expression from parallel corpora. Given a multiword expression to be translated, the method involves extracting a short list of target candidate words from parallel corpora based on scores of normalized frequency, generating possible translations and filtering out common subsequences, and selecting the top...

Multiword Sequences as Building Blocks for Language: Insights into First and Second Language Learning

Many grammatical frameworks view words and rules as the basic building blocks of language, with multiword sequences being treated as peripheral exceptions in the form of idioms, etc. (e.g., Pinker, 1999). The new millennium, however, has seen a shift toward construing multiword sequences not as linguistic rarities but as important building blocks for language acquisition and processing. Based o...

Publication date: 2004